{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# pandas 进阶修炼 ｜早起Python\n",
    "<br>\n",
    "\n",
    "**本习题由公众号【早起Python & 可视化图鉴】 原创，转载及其他形式合作请与我们联系（微信号`sshs321`)，未经授权严禁搬运及二次创作，侵权必究！**\n",
    "\n",
    "\n",
    "\n",
    "本习题基于 `pandas` 版本 `1.1.3`，所有内容应当在 `Jupyter Notebook` 中执行以获得最佳效果。\n",
    "\n",
    "不同版本之间写法可能会有少许不同，如若碰到此情况，你应该学会如何自行检索解决。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4 - 数据统计描述性分析\n",
    "\n",
    "\n",
    "<br>\n",
    "\n",
    "\n",
    "在上一章完成基本的数据预览以及缺失值和重复值的处理后。\n",
    "\n",
    "下一个步骤就是对数据进行简单的统计描述性分析，进一步观察数据特征。\n",
    "\n",
    "本章就整理了部分常见操作进行练习。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 初始化\n",
    "\n",
    "<br>\n",
    "\n",
    "该 `Notebook` 版本为**纯习题版**\n",
    "\n",
    "如果需要答案或者提示，可以微信搜索公众号「早起Python」获取！"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 加载数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_excel(\"2020年中国大学排名.xlsx\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据探索"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1 - 查看数据\n",
    "\n",
    "<br>\n",
    "\n",
    "查看数据前 10 行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2 - 修改索引\n",
    "\n",
    "<br>\n",
    "\n",
    "数据已经按照降序排列，让 学校 当索引会更好一点\n",
    "\n",
    "-> 修改索引为 学校名称 列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3 - 查看数据量\n",
    "\n",
    "也就是数据框的 行 * 列，总共单元格的数量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4 - 数据排序\n",
    "\n",
    "<br>\n",
    "\n",
    "将数据按照总分升序排列，并展示前20个学校\n",
    "\n",
    "备注：也就是看倒数20名啦"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5 - 数据排序\n",
    "\n",
    "将数据按照 高端人才得分 降序排序，展示前 10 位"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6 - 分列排名\n",
    "\n",
    "<br>\n",
    "\n",
    "查看各项得分最高的学校名称"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7 - 统计信息｜均值\n",
    "\n",
    "计算总分列的均值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 8 - 统计信息｜中位数\n",
    "\n",
    "<br>\n",
    "\n",
    "计算总分列的中位数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 9 - 统计信息｜众数\n",
    "\n",
    "\n",
    "计算总分列的众数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 10 -统计信息｜部分\n",
    "\n",
    "计算 总分、高端人才得分、办学层次得分的最大最小值、中位数、均值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 11 - 统计信息｜完整\n",
    "\n",
    "<br>\n",
    "\n",
    "查看数值型数据的统计信息（均值、分位数等），并保留两位小数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 12 - 统计信息｜分组\n",
    "\n",
    "计算各省市总分均值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 13 - 统计信息｜相关系数\n",
    "\n",
    "<br>\n",
    "\n",
    "也就是相关系数矩阵，也就是每两列之间的相关性系数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 14 - 相关系数｜热力图\n",
    "\n",
    "<br>\n",
    "\n",
    "将上一题的相关性系数矩阵制作为热力图"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 15 - 统计信息｜频率\n",
    "\n",
    "计算各省市出现的次数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "微信搜索公众号「早起Python」，关注后可以获得更多资源！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 16 - 统计信息｜热力地图\n",
    "\n",
    "结合 `pyecharts` 将各省市高校上榜数量进行地图可视化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 17 - 统计信息｜直方图\n",
    "\n",
    "绘制总分的直方图、密度估计图"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2 个 pandas EDA 插件"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在 pandas 之外，还有两个插件可以快速实现 EDA\n",
    "\n",
    "下面不作为习题，仅供介绍，感兴趣可以进一步搜索了解\n",
    "\n",
    "执行全部代码即可获得 EDA 报告！"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 18 - pandas_profiling\n",
    "\n",
    "<br>\n",
    "\n",
    "如果没有提前安装 `pandas_profiling` 的话，需要提前 `pip` 进行安装"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install pandas_profiling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas_profiling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [],
   "source": [
    "pandas_profiling.ProfileReport(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 19 - sweetviz\n",
    "\n",
    "如果没有提前安装 `sweetviz` 的话，需要提前 `pip` 进行安装"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install sweetviz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sweetviz as sv\n",
    "\n",
    "report = sv.analyze(df)\n",
    "report.show_html()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行完上面的代码后，当前目录下会出现一个html文件，打开即可看到相关 EDA 报告"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](http://liuzaoqi.oss-cn-beijing.aliyuncs.com/2021/09/16/16317972442543.jpg?域名/sample.jpg?x-oss-process=style/stylename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
